Using Deep Networks for Drone Detection
Drone detection is the problem of finding the smallest rectangle that
encloses the drone(s) in a video sequence. In this study, we propose a solution
using an end-to-end object detection model based on convolutional neural
networks. To address the scarcity of data for training the network, we propose
an algorithm for creating an extensive artificial dataset by combining
background-subtracted real images. With this approach, we achieve both high precision and high recall at the same time.
Comment: To appear in International Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques organised within AVSS 201
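To make the dataset-generation idea concrete, here is a minimal sketch, assuming a clean background plate per video and simple copy-paste compositing; the function names and thresholds are illustrative, not the authors' pipeline.

```python
import cv2
import numpy as np

def extract_drone_mask(frame, background, thresh=25):
    """Foreground mask of the drone via background subtraction (assumed setup)."""
    diff = cv2.absdiff(frame, background)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    # Remove speckle noise so the pasted patch has clean edges.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

def composite(drone_patch, patch_mask, scene, x, y):
    """Paste the masked drone into a new scene; the paste box is the ground truth."""
    h, w = patch_mask.shape
    region = scene[y:y + h, x:x + w]
    keep = (patch_mask > 0)[..., None]          # broadcast over BGR channels
    scene[y:y + h, x:x + w] = np.where(keep, drone_patch, region)
    return scene, (x, y, x + w, y + h)          # image + bounding-box label
```

Sampling positions, scales, and backgrounds at random then yields an arbitrarily large labeled set for training the detector.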
Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments
Referring to objects in a natural and unambiguous manner is crucial for
effective human-robot interaction. Previous research on learning-based
referring expressions has focused primarily on comprehension tasks, while
generating referring expressions is still mostly limited to rule-based methods.
In this work, we propose a two-stage approach that relies on deep learning for
estimating spatial relations to describe an object naturally and unambiguously
with a referring expression. We compare our method to a state-of-the-art
algorithm in ambiguous environments (e.g., environments that include very
similar objects with similar relationships). We show that our method generates
referring expressions that people find to be more accurate (30% better)
and would prefer to use (32% more often).
Comment: International Conference on Intelligent Robots and Systems (IROS 2019), Demo 1: Finding the described object (https://youtu.be/BE6-F6chW0w), Demo 2: Referring to the pointed object (https://youtu.be/nmmv6JUpy8M), Supplementary Video (https://youtu.be/sFjBa_MHS98)
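As a rough illustration of the two-stage idea (the scoring function stands in for the paper's learned spatial-relation estimator; all names here are hypothetical):

```python
from typing import Callable, List

def generate_expression(
    target: str,
    objects: List[str],
    relation_score: Callable[[str, str, str], float],  # stage 1: learned relation estimator
    relations=("left of", "right of", "in front of", "behind", "near"),
) -> str:
    """Stage 2: pick the relation/landmark pair that best singles out the target."""
    best, best_margin = None, float("-inf")
    for landmark in objects:
        if landmark == target:
            continue
        for rel in relations:
            # How much better does this description fit the target than the
            # best-fitting distractor? A large margin means low ambiguity.
            fit = relation_score(target, rel, landmark)
            distractor = max(
                (relation_score(o, rel, landmark)
                 for o in objects if o not in (target, landmark)),
                default=float("-inf"),
            )
            if fit - distractor > best_margin:
                best_margin = fit - distractor
                best = f"the {target} {rel} the {landmark}"
    return best
```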
A Deep Incremental Boltzmann Machine for Modeling Context in Robots
Modeling context is an essential capability for robots that need to be as
adaptive as possible in challenging environments. Although there are many context modeling
efforts, they assume a fixed structure and number of contexts. In this paper,
we propose an incremental deep model that extends Restricted Boltzmann
Machines. Our model processes one scene at a time and gradually extends the
contextual model when necessary, either by adding a new context or a new
context layer to form a hierarchy. We show on a scene classification benchmark
that our method converges to a good estimate of the contexts of the scenes, and
performs better than, or on par with, other incremental and non-incremental models on several tasks.
Comment: 6 pages, 5 figures, International Conference on Robotics and Automation (ICRA 2018)
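A toy sketch of the incremental behavior, assuming growth is triggered by reconstruction error (the paper's actual criterion and training procedure are richer than this):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GrowingRBM:
    """RBM-like model whose hidden (context) layer widens on poor fits."""

    def __init__(self, n_visible, n_contexts=1, grow_threshold=0.25):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_contexts))
        self.grow_threshold = grow_threshold

    def reconstruct(self, v):
        h = sigmoid(v @ self.W)        # contextual activations for the scene
        return sigmoid(h @ self.W.T)   # reconstruction from those contexts

    def observe(self, v):
        err = np.mean((v - self.reconstruct(v)) ** 2)
        if err > self.grow_threshold:
            # Existing contexts explain the scene poorly: add a new context
            # unit, seeded from the scene itself.
            self.W = np.hstack([self.W, 0.01 * v.reshape(-1, 1)])
        return err
```

Adding a new context layer when single-unit growth stops helping would extend the same loop into the hierarchy the abstract describes.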
COSMO: Contextualized Scene Modeling with Boltzmann Machines
Scene modeling is crucial for robots that need to perceive, reason about,
and manipulate the objects in their environments. In this paper, we adapt and
extend Boltzmann Machines (BMs) for contextualized scene modeling. Although
there are many models on the subject, ours is the first to bring together
objects, relations, and affordances in a highly capable generative model. To
this end, we introduce a hybrid version of BMs where relations and affordances
are incorporated into the model through shared, tri-way connections. Moreover, we
contribute a dataset for relation estimation and modeling studies. We evaluate
our method in comparison with several baselines on object estimation,
out-of-context object detection, relation estimation, and affordance estimation
tasks. Moreover, to illustrate the generative capability of the model, we show
several example scenes that the model is able to generate.
Comment: 40 pages, 15 figures, 9 tables, accepted to the Robotics and Autonomous Systems (RAS) special issue on Semantic Policy and Action Representations for Autonomous Robots (SPAR)
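One plausible reading of the shared, tri-way connections is an energy function with three-way multiplicative terms that couple a pair of object units to a relation (or affordance) unit; the notation below is illustrative, not taken from the paper:

```latex
% Illustrative hybrid-BM energy; o = object units, r = relation units,
% a = affordance units, W and U are shared tri-way weight tensors.
E(\mathbf{o},\mathbf{r},\mathbf{a}) =
  -\sum_i b_i\,o_i \;-\; \sum_k c_k\,r_k \;-\; \sum_l d_l\,a_l
  \;-\; \sum_{i,j,k} W_{ijk}\,o_i\,o_j\,r_k
  \;-\; \sum_{i,j,l} U_{ijl}\,o_i\,o_j\,a_l
```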
CINet: A Learning Based Approach to Incremental Context Modeling in Robots
There have been several attempts at modeling context in robots. However,
these attempts either assume a fixed number of contexts or use a rule-based
approach to determine when to increment the number of contexts. In this paper,
we pose the task of when to increment as a learning problem, which we solve
using a Recurrent Neural Network. We show that the network successfully (with
98% testing accuracy) learns to predict when to increment, and demonstrate, in
a scene modeling problem (where the correct number of contexts is not known),
that the robot increments the number of contexts in an expected manner (i.e.,
the entropy of the system is reduced). We also present how the incremental
model can be used for various scene reasoning tasks.
Comment: The first two authors contributed equally, 6 pages, 8 figures, International Conference on Intelligent Robots and Systems (IROS 2018)
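A minimal PyTorch sketch of the learning formulation, with the scene encoding and layer sizes as assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class IncrementPredictor(nn.Module):
    """Reads a sequence of scene encodings and decides whether to add a context."""

    def __init__(self, scene_dim=64, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRU(scene_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, scenes):                    # scenes: (batch, time, scene_dim)
        _, h = self.rnn(scenes)
        return torch.sigmoid(self.head(h[-1]))   # P(increment) after the last scene

model = IncrementPredictor()
p_increment = model(torch.randn(8, 10, 64))       # 8 sequences of 10 scenes each
add_context = p_increment > 0.5                   # binary increment decision
```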
Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks
Most state-of-the-art approaches for Facial Action Unit (AU) detection rely
upon evaluating facial expressions from static frames, encoding a snapshot of
heightened facial activity. In real-world interactions, however, facial
expressions are usually more subtle and evolve in a temporal manner, requiring
AU detection models to learn spatial as well as temporal information. In this
paper, we focus on both spatial and spatio-temporal features encoding the
temporal evolution of facial AU activation. For this purpose, we propose the
Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) that performs AU
detection using both frame and sequence-level features. While at the
frame-level the capsule layers of AULA-Caps learn spatial feature primitives to
determine AU activations, at the sequence-level, it learns temporal
dependencies between contiguous frames by focusing on relevant spatio-temporal
segments in the sequence. The learnt feature capsules are routed together such
that the model learns to selectively focus more on spatial or spatio-temporal
information depending upon the AU lifecycle. The proposed model is evaluated on
the commonly used BP4D and GFT benchmark datasets, obtaining state-of-the-art results on both datasets.
Comment: Updated Figure 6 and the Acknowledgements. Corrected typos. 11 pages, 6 figures, 3 tables
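As a rough schematic of the frame-versus-sequence fusion (a plain two-stream network with a learned gate standing in for capsule routing; none of the module choices below is the AULA-Caps architecture itself):

```python
import torch
import torch.nn as nn

class TwoStreamAU(nn.Module):
    """Fuses a frame-level and a sequence-level feature stream per AU."""

    def __init__(self, feat_dim=256, n_aus=12):
        super().__init__()
        self.frame_enc = nn.Linear(feat_dim, 128)        # spatial primitives
        self.temporal_enc = nn.GRU(feat_dim, 128, batch_first=True)
        self.gate = nn.Linear(256, 1)                    # spatial vs. temporal emphasis
        self.head = nn.Linear(128, n_aus)

    def forward(self, frames):                           # (batch, time, feat_dim)
        f = torch.relu(self.frame_enc(frames[:, -1]))    # current frame
        _, h = self.temporal_enc(frames)
        t = h[-1]                                        # sequence summary
        g = torch.sigmoid(self.gate(torch.cat([f, t], dim=-1)))
        fused = g * f + (1 - g) * t                      # lifecycle-dependent mix
        return torch.sigmoid(self.head(fused))           # per-AU activation
```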
MAGiC: A multimodal framework for analysing gaze in dyadic communication
The analysis of dynamic scenes has been a challenging domain in eye tracking research. This study presents a framework, named MAGiC, for analyzing gaze contact and gaze aversion in face-to-face communication. MAGiC provides an environment that is able to detect and track the conversation partner’s face automatically, overlay gaze data on top of the face video, and incorporate speech by means of speech-act annotation. Specifically, MAGiC integrates eye tracking data for gaze, audio data for speech segmentation, and video data for face tracking. MAGiC is an open source framework, and its usage is demonstrated via publicly available video content and wiki pages. We explored the capabilities of MAGiC through a pilot study and showed that it facilitates the analysis of dynamic gaze data by reducing the annotation effort and the time spent on manual analysis of video data.
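A minimal sketch of the kind of timeline alignment MAGiC automates, assuming gaze samples, face-track boxes, and speech-act segments have already been exported with shared timestamps (the data layout here is an assumption, not MAGiC's actual file format):

```python
from bisect import bisect_right

def annotate(gaze, faces, speech):
    """gaze: [(t, x, y)], faces: [(t, (x1, y1, x2, y2))] sorted by t,
    speech: [(t_start, t_end, act_label)]."""
    face_times = [t for t, _ in faces]
    rows = []
    for t, x, y in gaze:
        i = max(bisect_right(face_times, t) - 1, 0)   # nearest earlier face box
        x1, y1, x2, y2 = faces[i][1]
        on_face = x1 <= x <= x2 and y1 <= y <= y2
        act = next((lbl for s, e, lbl in speech if s <= t <= e), None)
        rows.append((t, "contact" if on_face else "aversion", act))
    return rows
```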
The Effect of Movement and Play-Based Music Education on Musical Skills of Students Affected by Mental Disability
The research aims to determine the effect of movement and play-based music education on the musical skills (applying dynamics, body playing, singing) of students with moderate intellectual disability. Within this scope, the goal was to improve the dynamics application skills, body playing skills, and singing skills of students with special needs. The study used the multiple probe design across behaviors, one of the single-subject experimental designs. A student affected by moderate intellectual disability participated in the study. The findings showed that the effects of Movement and Play-Based Music Education (MPBME) on the student's dynamics application, body playing, and singing skills were statistically significant and positive. The student developed these skills, demonstrated them across different applications, and maintained them over time.